FIGURE 6.5
Illustration of training $w_i^j$ via Expectation-Maximization. Weights that already obey one specific distribution, i.e., those lower than the minimum mean value or higher than the maximum mean value, are left free of the constraint. For the ones in the middle area (shown as non-transparent in the figure), we apply EM(·) to constrain them to converge to a specific distribution.
6.3.4 Optimization for POEM
In our POEM, what needs to be learned and updated are the unbinarized weights $\mathbf{w}_i$, the scale factor $\alpha_i$, and the other parameters $p_i$. These three kinds of parameters are jointly learned. In each Bi-FC layer, POEM sequentially updates the unbinarized weights $\mathbf{w}_i$ and the scale factor $\alpha_i$. For the other layers, we directly update the parameters $p_i$ through backpropagation.
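As a concrete illustration of this parameter split, the following PyTorch-style sketch defines a hypothetical BiFC module that keeps the unbinarized weights $\mathbf{w}_i$ and the scale factor $\alpha_i$ as separate learnable tensors, together with a helper that groups a model's parameters into the three kinds above. The class, function, and attribute names are illustrative, not taken from the original implementation.

```python
import torch
import torch.nn.functional as F


class BiFC(torch.nn.Module):
    """Hypothetical 1-bit fully connected layer with latent weights and a scale factor."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(0.01 * torch.randn(out_features, in_features))  # w_i
        self.alpha = torch.nn.Parameter(torch.ones(out_features, 1))                      # alpha_i

    def forward(self, x):
        # Binarize with a straight-through estimator so gradients still reach w_i.
        b_w = (torch.sign(self.weight) - self.weight).detach() + self.weight
        return F.linear(x, self.alpha * b_w)


def split_parameters(model):
    """Group parameters into (w_i, alpha_i, p_i) as described in the text."""
    w, alpha, p = [], [], []
    for module in model.modules():
        if isinstance(module, BiFC):
            w.append(module.weight)
            alpha.append(module.alpha)
        else:
            p.extend(module.parameters(recurse=False))
    return w, alpha, p
```

An optimizer can then treat the three groups differently, applying the POEM-specific updates described below to $\mathbf{w}_i$ and $\alpha_i$, while $p_i$ is trained by ordinary backpropagation.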
Updating $\mathbf{w}_i$ via Expectation-Maximization: A conventional binarization framework learns the weights $\mathbf{w}_i$ based on Eq. 6.44. The update quantity $\delta_{\mathbf{w}_i}$ corresponding to $\mathbf{w}_i$ is defined as
$$\delta_{\mathbf{w}_i} = \frac{\partial L_S}{\partial \mathbf{w}_i} + \lambda \frac{\partial L_R}{\partial \mathbf{w}_i}, \tag{6.45}$$
$$\mathbf{w}_i \leftarrow \mathbf{w}_i - \eta\,\delta_{\mathbf{w}_i}, \tag{6.46}$$
where $L_S$ and $L_R$ are loss functions and $\eta$ is the learning rate. $\frac{\partial L_S}{\partial \mathbf{w}_i}$ can be computed by backpropagation, and, furthermore, we have
$$\frac{\partial L_R}{\partial \mathbf{w}_i} = (\mathbf{w}_i - \alpha_i \circ b^{\mathbf{w}_i}) \circ \alpha_i. \tag{6.47}$$
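To make Eqs. 6.45–6.47 concrete, the following sketch performs one update of the unbinarized weights of a single Bi-FC layer, assuming `grad_LS` holds $\partial L_S/\partial \mathbf{w}_i$ obtained by backpropagation; the default values of $\lambda$ and $\eta$ are placeholders, not values from the original work.

```python
import torch


def poem_weight_step(w, alpha, grad_LS, lam=1e-4, eta=1e-3):
    """One update of the unbinarized weights w_i following Eqs. 6.45-6.46.

    w       : unbinarized weights w_i            (out_features, in_features)
    alpha   : channel-wise scale factor alpha_i  (out_features, 1)
    grad_LS : dL_S/dw_i from backpropagation     (same shape as w)
    """
    b_w = torch.sign(w)                      # binarized weights b^{w_i}
    grad_LR = (w - alpha * b_w) * alpha      # Eq. 6.47
    delta = grad_LS + lam * grad_LR          # Eq. 6.45
    return w - eta * delta                   # Eq. 6.46
```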
However, this backpropagation process without the necessary constraint will result in a Gaussian distribution of $\mathbf{w}_i$, which degrades the robustness of Bi-FCs as revealed in Eq. 6.80. Our POEM takes another learning objective,
$$\arg\min_{\mathbf{w}_i} \left\| b^{\mathbf{w}_i} - b^{\mathbf{w}_i + \gamma} \right\|. \tag{6.48}$$
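The effect that Eq. 6.48 penalizes can be checked numerically: weights concentrated around zero flip sign under a small disturbance $\gamma$, whereas a bimodal distribution away from zero barely changes. The sample distributions and the value of $\gamma$ below are made up for the illustration.

```python
import torch

torch.manual_seed(0)
gamma = 0.05                                     # small disturbance gamma

w_gaussian = 0.02 * torch.randn(10_000)          # unimodal weights around zero
w_bimodal = 0.3 * torch.sign(torch.randn(10_000)) + 0.02 * torch.randn(10_000)

def flip_rate(w):
    """Fraction of weights whose binarization changes under the disturbance."""
    return (torch.sign(w) != torch.sign(w + gamma)).float().mean().item()

print(flip_rate(w_gaussian))   # large: many signs flip, so the objective in Eq. 6.48 is large
print(flip_rate(w_bimodal))    # close to zero: the binarization is robust
```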
To learn Bi-FCs capable of overcoming this obstacle, we introduce the EM algorithm into the update of $\mathbf{w}_i$. First, we assume that the ideal distribution of $\mathbf{w}_i$ should be bimodal.
Assumption 6.3.1. For every unbinarized weight of the $i$-th 1-bit layer, i.e., $\forall w_i^j \in \mathbf{w}_i$, it can be constrained to follow a Gaussian Mixture Model (GMM).
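Under Assumption 6.3.1, the unbinarized weights of a layer can be fitted with a two-component GMM via Expectation-Maximization; a minimal one-dimensional EM loop is sketched below. This is only an illustrative fit (it assumes weights of both signs are present), not the chapter's EM(·) operator, which additionally leaves the weights outside the two fitted means unconstrained and only pulls the middle ones toward a mode, as described in the caption of Figure 6.5.

```python
import torch


def fit_gmm_em(w, n_iter=50):
    """Fit a 2-component 1-D GMM to the flattened weights w via EM.

    Returns the component means, standard deviations, and mixing weights.
    """
    w = w.flatten()
    # Initialize one mode on each side of zero (assumes both signs occur).
    mu = torch.stack([w[w < 0].mean(), w[w >= 0].mean()])
    sigma = torch.stack([w.std(), w.std()])
    pi = torch.tensor([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each weight.
        diff = w.unsqueeze(1) - mu                                    # (N, 2)
        log_prob = -0.5 * (diff / sigma) ** 2 - torch.log(sigma) + torch.log(pi)
        resp = torch.softmax(log_prob, dim=1)                         # (N, 2)

        # M-step: re-estimate means, standard deviations, and mixing weights.
        nk = resp.sum(dim=0)                                          # (2,)
        mu = (resp * w.unsqueeze(1)).sum(dim=0) / nk
        sigma = ((resp * (w.unsqueeze(1) - mu) ** 2).sum(dim=0) / nk).sqrt().clamp_min(1e-6)
        pi = nk / w.numel()

    return mu, sigma, pi
```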